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Abstract 

In  this  project  we  investigate  the  impact  of  the  maritime  environment  on  the  propagation  of  laser 
beams.  This  study  primarily  uses  data  collected  at  the  Naval  Academy  with  the  goal  of  quanti¬ 
fying  the  correlation  between  the  statistics  of  the  environmental  parameters  and  the  statistics  of 
laser  beam  intensity  at  the  target.  The  project  has  two  parts  to  it:  1)  we  present  a  computational 
analysis  of  different  probability  density  function  approximation  techniques;  and  2)  we  introduce 
preliminary  steps  towards  developing  a  stochastic  model  for  the  maritime  laser  beam  propagation. 
In  the  first  part  of  this  work  we  apply  three  mathematical  methods  to  construct  the  probability  den¬ 
sity  function  of  the  data:  i)  the  Kernel  Density  Estimator  (KDE)  method,  ii)  the  Barakat  Method 
using  lower-order  moments,  and  iii)  the  Bayesian  Mixture  Model.  We  compare  and  contrast  the 
features  of  the  three  approximation  techniques,  first  in  the  context  of  a  synthetic  data  whose  true 
pdf  is  known,  and  next  in  the  context  of  the  laser  data.  In  the  second  task,  we  analyze  how  a  com¬ 
plex  medium  causes  the  photons  of  the  laser  light  to  behave  differently  than  if  they  were  acting  in 
freespace,  by  focusing  on  the  stochastic  behavior  that  our  data  exhibits.  We  develop  a  stochastic 
paraxial  wave  equation  in  order  to  have  a  mathematical  model  capable  of  accepting  statistical  pa¬ 
rameters  from  the  atmosphere  as  input  to  allow  us  to  investigate  the  statistical  properties  of  light 
intensity  at  a  specified  target. 


Keywords:  laser  propagation,  data  analysis,  Bayesian  statistics,  probability  density  function. 
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Chapter  1 

Introduction  and  Motivation 


The  intensity  of  laser  light  plays  an  integral  role  in  many  naval  applications.  From  laser-based 
weapon  systems  to  determining  distances  with  Laser  Range  Finders,  the  Navy  has  much  to  gain 
from  increased  knowledge  in  this  field.  Philip  Nielsen,  in  Effects  of  Directed  Energy  Weapons,  [8], 
emphasizes  the  uses  for  lasers  in  naval  weaponry.  Andrews  and  Phillips,  Laser  Beam  Propagation 
Through  Random  Media,  [1],  develop  the  mathematical  framework  associated  with  laser  beams, 
and  we  intend  to  use  these  principles  in  an  effort  to  further  investigate  the  mathematical  properties 
of  laser  beams  in  the  maritime  domain.  Our  goal  by  the  end  of  the  project  is  to  better  understand 
and  quantify  the  properties  of  laser  light  by  analyzing  the  intensities  of  laser  light  after  the  light 
has  traveled  through  complex  media.  It  is  well-known  that  fluctuations  in  air  temperature,  as  well 
as  the  presence  of  aerosols  and  other  scatterers,  impacts  the  direction  of  propagation  of  a  beam  as 
well  as  the  amount  of  power  it  may  deposit  on  a  target  (see  Chapter  2  of  [1]). 

This  project  primarily  deals  with  the  mathematical  and  computational  analysis  of  data  collected 
from  a  Helium-Neon  laser.  The  data  we  analyze  has  already  been  collected  by  a  group  supervised 
by  Prof.  S.  Avramov-Zamurovic  of  the  Weapons  and  Systems  Engineering  Department.  The  data 
is  of  light  intensity  and  is  collected  from  a  field  experiment,  in  which  we  have  set  up  a  laser  and 
sensor  in  stationary  locations  on  Sherman  Field,  located  on  the  premises  of  the  Naval  Academy. 
Figure  2  shows  a  typical  stochastic  behavior  of  the  signals  in  the  data  and  our  goal  is  to  understand 
the  extent  of  the  contribution  of  the  atmospheric  conditions  to  the  stochastic  behavior  of  light 
intensity  in  signals  such  as  the  one  in  Figure  2.  To  that  end,  we  will  use  the  recorded  atmospheric 
parameters,  including  humidity,  precipitation,  temperature,  wind,  and  cloud  cover,  with  the  goal 
of  correlating  the  propagation  properties  with  the  effects  of  the  turbulence  in  the  medium.  Most 
of  the  laser  beam  data  we  have  is  from  a  target  that  is  stationed  375  meters  from  the  propagation 
source  (see  [6]  for  a  detailed  description  of  the  data  collection).  We  will  analyze  the  environmental 
diversity  in  data,  ranging  from  the  ambient  light  (day  or  night)  to  the  aerosol  conditions  (land  vs. 
water);  we  will  compute  the  statistical  parameters  of  the  data  in  order  to  construct  its  probability 
density  function,  with  the  goal  of  identifying  the  signature  of  the  atmospheric  turbulence  in  our 
data. 

The  sensor  that  detects  photons  of  light  is  capable  of  recording  intensities  of  the  laser  beam  at 
a  rate  of  10000  Hz.  That  rate  is  equivalent  to  taking  a  data  point  every  150  microseconds.  Data 
is  collected  for  approximately  three  minutes,  resulting  in  over  a  million  data  points  to  work  with 
for  each  data  set.  The  data  is  stored  electronically,  and  then  converted  into  a  vector  that  can  be 
interpreted  by  MAT  LAB.  Currently  over  100  such  data  sets  are  available  on  file  for  a  variety  of 
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different  atmospheric  scenarios,  equating  to  more  than  100  million  data  points  for  the  intensity  of 
laser  light.  See  Figure  1.1  for  an  example  of  a  time  series  plot  of  laser  data.  This  graph  shows  the 
intensity  of  a  recorded  laser  beam  at  a  single  pixel  on  a  sensor  located  375  meters  from  the  source. 
The  duration  of  the  time  series  is  180  seconds,  at  a  time  interval  of  150  fxs. 


Time  Series  Plot  of  HeNe  50dB  375m  Laser  Data  17 


0  20  40  60  80  100  120  140  160  180 

Time  (s) 


Figure  1.1:  Time  Series  Plot  of  Laser  Data. 


The  structure  of  this  thesis  is  as  follows:  In  Chapter  2  we  review  the  structure  of  a  laser  beam 
in  free  space.  In  Chapter  3  we  review  concepts  of  histograms  and  probability  density  functions, 
and  we  will  create  histograms  for  our  laser  data  in  Chapter  4.  In  Chapter  5,  the  method  of  Barakat 
will  be  introduced.  Chapter  6  is  dedicated  to  the  Kernel  Density  Method.  Chapter  7  consists  of  the 
computation  related  to  the  Gaussian  Mixture  Methods.  Chapter  8  provides  a  comparison  and  con¬ 
trast  of  the  three  methods  for  computing  a  pdf  using  synthetic  data.  Chapter  9  implements  these 
techniques  on  real  data  collected  in  the  maritime  domain.  Chapter  10  provides  an  introduction 
into  future  research  by  computing  a  solution  to  the  paraxial  wave  equation  that  with  a  stochastic 
refraction  coefficient.  Chapter  1 1  is  the  conclusion  section. 
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Chapter  2 

Laser  Beam  Propagation  in  Free  Space 


Light  propagation  is  governed  by  Maxwell’s  equations.  Let  {E.  B}  be  the  electric  and  magnetic 
field  pair  that  defines  light.  Then  in  a  region  with  no  charges  or  currents,  Maxwell’s  equations  are 


„  ^  <9B 

VXE=-^ 


„  „  1  dE 

VXB  =  C^W 


(2.1) 


V  •  E  =  0,  V  •  B  =  0.  (2.2) 

Here  c  is  the  speed  of  light  and  is  related  to  the  medium’s  permittivity  eo  and  permeability  /i0  by 

1 


c2  = - . 

eo/A> 

In  what  follows,  where  we  closely  follow  the  development  in  Chapter  4  of  [1],  we  will  show  that 
each  component  of  E  and  B  satisfies  the  wave  equation.  To  see  this  first  take  the  curl  of  the  second 
equation  in  (2.1)  and  use  the  identity  V  x  (V  x  E)  =  V(V  •  E)  —  AE,  and  the  first  equation  in 
2.2),  V  •  E  =  0,  to  get 

AE=^(VxB) 

Next,  use  the  first  equation  in  (2.1),  V  x  B  =  iff ,  to  eliminate  B  and  get  a  single  equation  for 

E"  AT?  -  1  d*E 

The  above  equation  is  the  well-known  and  standard  wave  equation. 

A  similar  development  shows  that  B  satisfies  the  same  wave  equation.  Since  the  equations  are 
linear,  we  conclude  that  each  component  of  either  E  or  B  satisfies  the  scalar  wave  equation 

1  d2u 

A“=?ar  (23) 

Next  we  look  for  special  solutions  of  the  wave  equation  (2.3),  namely,  monochromatic  solutions 
of  the  form 

u{r,t)  =  v(r)elut,  r  =  (x,y,z) 

After  substituting  the  above  template  into  (2.3),  we  see  that  v  satisfies  Helmholtz’s  equation 

cz 


Av  +  k2v  =  0, 


(2.4) 
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Finally,  we  look  for  beam-like  soluions,  i.e., 

v(r)  =  V{v)eikz. 

Note  that  the  function  V  in  (2.5  is  complex-valued, i.e,  V 
(2.5)  into  the  Helmholtz  equation  (2.4)  and  get 

.  d2V  n.,dV 
A±V  +  +  2ik—~ 

OZ1  oz 


(2.5) 

V\  +  i\ 2.  We  substitute  the  template 

0, 


where  A±  = 

One  of  the  main  assumptions  of  laser  beam  propagation  theory  (again,  see  Chapter  4,  [1])  is  the 
assumption  that  the  term  \yyyr\  is  much  smaller  than  \2ik^p\,  an  is  therefore  ignored  in  the  above 
equation,  resulting  in  the  partial  differential  equation 

dV 

+  2  ik—  =  0.  (2.6) 

oz 

The  above  equation  is  called  the  Paraxial  Wave  Equation  (PWE).  We  emphasize  again  that  this 
equation  is  complex-valued. 


2.1  Exact  Solution  of  PWE  in  half-space 


Continuing  to  follow  the  material  in  Chapter  4  of  [1],  we  now  present  the  exact  solution  of  (2.6) 
when  V  is  known  at  the  aperture  z  =  0.  It  turns  out  that  if  V (x.  y,  0)  is  Gaussian,  a  term  that  we 
will  define  precisely  in  the  next  section,  then  V (x.  y.  z)  remain  Gaussian  for  all  z:  Let 


V(x,y,  0)  =  Aexp(— 777^  + 


W2  2iV’ 


(2.7) 


where  Wq  and  F0  are  called  the  spot-size  radius  and  the  phase  radius  of  curvature  of  the  beam  at 
z  =  0,  then  (2.6)  has  the  exact  solution 


V(x,y,z)  = 


A 


exp(- 


r2  .  kr 2 


+  i(0- 


)), 


yXfT©!  v  v  2 F(zy 

where  A0,  called  the  refraction  parameter,  and  ©0,  the  diffraction  parameter,  are 


(2.8) 


.  1  z  .  2z 


(2.9) 


o 


and  W  and  F,  which  are  the  spot-size  radius  of  the  beam  and  its  phase  radius  of  curvature  at  any 
z,  are 

W(z)  =  W0yJ A20  +  &1  0  =  tan”1^  (2.10) 


F(z)  =  Fl 


(eg  +  Ag)(6>0  -  1) 
02  +  A2  -  e0) 


(2.11) 
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Figure  2.6  shows  a  typical  instance  of  the  solution  of  PWE.  In  this  figure  the  contours  of  inten¬ 
sity  (irradiance)  of  a  laser  beam  are  plotted  at  a  fixed  distance  z  for  the  aperture  -  the  intensity  of 
light  is  defined  as  \V\2.  As  expected,  the  beam  has  kept  its  gaussian  shape  far  from  the  aperture. 
This  feature,  that  a  beam  starting  out  as  a  gaussian,  and  remaining  gaussian  for  all  values  of  z  is 
a  special  feature  of  beams  propagating  in  vacuum  or  free  space.  In  the  problems  that  we  take  up 
in  this  paper,  the  laser  beams  that  propagate  in  the  maritime  domain  must  interact  with  the  atmo¬ 
sphere  and  consequently  the  gaussian  nature  of  the  beam  is  destroyed.  One  of  the  main  goals  of 
this  paper  is  to  attempt  to  quantify  the  impact  of  the  atmosphere  on  a  laser  beam  based  on  the  data 
we  collect  from  various  experiments.  The  next  chapter  introduces  some  of  the  mathematical  tools 
we  need  for  this  quantification. 

Figure  2.6  is  obtained  by  executing  the  following  program  in  MATLAB: 


lambda=633* 10  A-9;  k=2  *pi/ lambda; F0  =  500;W0  =  0.03; 

ThetaO  =  inline (' 1-z/FO' , '  z'  ,  'F0' )  ; 

Lambda 0  =  in line ('2*z/k/W0A2'  ,  '  z '  ,  '  k'  ,  'WO' )  ; 
z=0 : 1200 ; 

TH=ThetaO ( z , F0 ) ;  LA=LambdaO ( z , k, WO ) ; 

W=W0  *  sqrt (TH. A2+LA. A2)  ; 
figure ( 1 ) 
plot ( z , W) 

title  ('Graph  of  W' ) 

F=F0* (TH. A2+LA. A2) . * (TH-1) . / (TH. A2+LA. A2-TH) ; 
figure (2 ) 
plot  (z, F) 

title ('Graph  of  F' ) 

[X,  Y] =meshgrid (-0.05:0.001:0.05,-0.05:0.001:0.05); 
z=1200 ; 

TH=Theta0  ( z , F0 ) ;  LA=LambdaO ( z , k, W0 )  ; 

W=W0  *  sqrt (TH. A2+LA. A2)  ; 

Irradiance=l . / ( TH . A2+LA . A2 ) . *exp (-2*(X.A2+Y.A2) ./W.A2); 
figure ( 3 ) 

[c,  h] =contour (X, Y, Irradiance) 

clabel (c, h) 

colorbar 

tit le  (' Contours  of  Irradiance  at  z  =1200') 
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Contours  of  Irradiance  at  z  =1200 


Figure  2.1:  Contours  of  the  intensity,  i.e.,  V2  +  V22  at  a  fixed  distance  (z  =  1200  meters)  from  the 
aperture.  This  figure  is  obtained  with  parameter  values  A  =  633  nanometers,  F0  =  500  meters, 
Wq  =  0.03  meters. 
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Chapter  3 

Review  of  Histogram  and  PDF  Concepts 


3.1  Histograms 

We  begin  by  analyzing  the  data  using  histograms.  Histograms,  arguably  the  most  elementary 
method  of  analyzing  data  points,  display  the  frequency  of  occurrence  of  a  random  variable  in  an 
interval  or  bin.  See  Figure  3.1  for  an  example  of  a  histogram,  in  which  we  take  a  sample  of  400 
points  from  a  normal  distribution  with  mean  0  and  standard  deviation  1.  We  create  ten  equally 
spaced  bins  and  place  the  points  in  the  corresponding  bins  to  create  our  histogram.  The  MATLAB 
code  used  to  create  the  histogram  in  Figure  3.1  is  stated  below: 

>>  data=randn (400, 1) ; 

>>  hist (data, 10 ) 

>>  title (' Normally  Distributed  Random  Numbers') 


Normally  Distributed  Random  Numbers 


Figure  3.1:  Example  of  a  histogram. 


In  addition  to  being  easy  to  create,  histograms  are  not  computationally  strenuous.  For  example, 
we  will  see  with  later  methods  that  it  is  computationally  impossible  to  use  all  the  data  points  of  a 
data  set,  while,  with  histograms,  we  can  use  all  the  data  points.  Future  methods  will  require  us  to 
create  functions  for  each  data  point  and  sum  the  functions  together.  The  summation  can  be  com- 
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putationally  difficult  if  the  sample  size  of  our  data  is  significantly  high.  However,  with  histograms, 
the  points  are  simply  placed  into  pre-allocated  bins,  which  is  not  computationally  difficult.  By 
using  all  the  data  points  in  our  data  set,  we  ensure  that  we  are  not  ignoring  any  crucial  data  points. 

The  bin  size  corresponds  to  the  width  of  each  bar  in  the  histogram.  Perhaps  the  easiest  way  to 
understand  bins  is  with  an  example.  Consider  the  ages  of  100  random  individuals  who  are  walking 
into  a  Wal-Mart,  given  below — synthetic  Wal-Mart  data  was  generated  from  a  uniform  distribution 
of  integers  from  0-99. 
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A  bin  size  of  10  would  allow  us  to  focus  on  the  variation  of  customers  whose  ages  range  within 
a  ten  year  period.  For  example,  the  first  bin  takes  into  account  all  customers  between  the  ages  of  0 
and  9,  the  second  bin  those  whose  ages  range  between  10  and  19,  and  so  forth.  Figure  3.2  shows 
the  histogram  of  the  data: 


Ages  in  Wal-Mart 


Figure  3.2:  Histogram  of  Ages  with  10  Bins. 


It  is  easy  to  see  from  the  histogram  that  we  had  17  people  in  the  age  range  of  0-9  years  old,  and 
6  people  in  the  age  range  of  40-49  years  old.  If  we  change  the  number  of  bins  to  fewer  than  ten, 
we  lose  some  of  the  details  of  the  variability  that  the  data  carries.  Figure  3.3  shows  the  histogram 
when  we  use  only  two  bins: 
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Ages  in  Wal-Mart  with  2  Bins 


25  75 


Figure  3.3:  Histogram  of  Ages  with  2  Bins. 


Similarly,  if  we  have  too  many  bins,  we  can  lose  meaningful  information  contained  in  the  data. 
In  Figure  3.4,  for  example,  there  are  holes  in  the  histogram  that  depict  a  0%  probability  of  an  indi¬ 
vidual  walking  into  Wal-Mart  between  the  ages  of  16  and  18.  However,  we  know  that  the  data  was 
pulled  from  a  uniform  distribution.  Therefore,  there  should  be  just  as  much  chance  as  seeing  an 
individual  walk  into  Wal-Mart  between  the  ages  of  16-18  and  18-20.  Hence,  we  need  to  decrease 
the  bin  size  in  order  to  plot  a  more  accurate  histogram  for  the  synthetic,  uniformly  distributed, 
Wal-Mart  data. 


Figure  3.4:  Histogram  of  Ages  with  50  Bins. 


What  we  should  see  from  the  Wal-Mart  example  is  that  we  need  a  sufficiently  large  number  of 
bins  with  our  histograms  in  order  to  create  a  histogram  that  is  well  represented  by  the  data,  but  not 
make  the  number  so  large  that  we  lose  the  overall  shape  that  the  histogram  should  display.  We  will 
do  the  same  process  with  our  Laser  Data  later  in  Chapter  3. 
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3.2  Normal  "Gaussian"  Probability  Density  Function 

A  probability  density  function  (pdf)  is  a  function  used  for  determining  the  probability  of  a  contin¬ 
uous  random  variable;  particularly  the  probability  that  a  certain  data  point  will  fall  between  two 
values  (a  and  b).  In  order  to  determine  the  probability,  we  compute  the  area  under  the  curve  of  the 
pdf,  or  the  integral  from  a  to  b.  That  is,  a  probability  density  function  f(x)  is  defined  as, 


P(a  <  x  <  b)  =  /  f(x)dx. 
J  a 


By  definition,  the  area  under  any  pdf  must  equal  1.  Which  means 

POO 

/  P(x)dx  =  1 


(3.1) 


The  cumulative  density  function  (cdf)  is  used  when  comparing  different  pdfs  using  the  Kolmogorov- 
Smironov  (K-S)  Test  or  Least  Squares  Test  (described  in  Chapter  7).  The  cdf,  F(x)  is  defined  as, 


F(x) 


(3.2) 


Note  that  liinT_>._00  F(x)  =  0  and  Hindoo  F(x)  =  1. 


The  Gaussian  pdf  is  perhaps  the  most  commonly  used  pdf  in  statistics  and  physical  applica¬ 
tions.  The  Gaussian  pdf  is  sometimes  colloquially  referred  to  as  the  "bell  curve"  based  on  its  shape. 
The  Gaussian  pdf  has  two  arguments:  the  mean  (/i)  and  standard  deviation  (a),  and  is  given  by  the 
following  equation: 

1  (x  —  /j.)2 

f(x)  =  — 7=e  (3.3) 

crv2vr 

The  standard  Gaussian  pdf  has  n  —  0  and  a  —  1.  See  Figure  3.5. 


Gaussian  PDF  with  p.=0  and  o=1 


Figure  3.5:  Gaussian  PDF  with  /i  =  0  and  a  —  1. 
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Let  us  note  that  /i6l  and  a  e  M+.  We  should  also  note  that  a  change  in  fi  corresponds  to  a 
horizontal  shift,  or  translation,  of  the  pdf.  In  addition,  a  change  in  a  corresponds  to  a  horizontal 
stretch  of  the  pdf.  See  Figures  3.6  and  3.7  as  a  reference: 


Gaussian  PDF  ji  Comparison 


Figure  3.6:  Comparison  of  Different  Values  for  /i. 


Gaussian  PDF  o  Comparison 


Figure  3.7:  Comparison  of  Different  Values  for  a. 


We  will  use  the  Gaussian  pdf  extensively  throughout  the  remainder  of  the  paper  in  various  ways 
to  compute  approximate  data-driven  pdfs.  Both  the  Kernel  Method  and  Gaussian  Mixture  Method 
use  the  Gaussian  pdf  as  a  basis  for  the  approximate  pdf. 
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Chapter  4 

Laser  Data  Conveyed  Through  Histograms 


For  this  paper  we  will  rely  on  two  data  sets,  one  collected  during  the  daytime  and  the  other  at 
nighttime.  The  first  data  set  that  we  will  consider  will  be  nighttime  data.  The  data  was  taken  in 
the  middle  of  the  night,  when  there  was  little  to  no  light  interfering  with  the  laser  beam.  We  will 
use  this  as  our  control  for  other  sets  of  data.  The  other  data  set  we  will  use  is  daytime  data,  which 
was  collected  while  the  sun  was  actively  impacting  the  laser  beam.  The  data  was  collected  from 
a  stationary  sensor,  located  375  meters  away  from  the  laser  aperture,  that  recorded  scalar  intensity 
values  at  a  rate  of  10  kHz  for  three  minutes.  The  wavelength  of  the  laser  beam  is  633  nm. 


4.1  Analysis  of  Nighttime  Data  Using  Histograms 

We  will  begin  by  looking  at  a  histogram  with  only  10  bins.  See  Figure  4.1.  We  can  see  that  the 
histogram  above  does  not  reveal  much  information  about  the  data  set  other  than  giving  us  the  upper 
and  lower  bounds.  Therefore,  we  will  increase  the  number  of  bins  by  an  order  of  magnitude.  The 
resulting  histogram,  shown  in  Figure  4.2,  reveals  more  information  about  the  data,  but  still  does 
not  depict  the  features  that  some  of  the  histograms  with  smaller  bin  sizes  are  able  to  depict. 


Histogram  for  Black  Data  with  10  Bins 


Figure  4.1:  Histogram  of  Black  Data  with  10  Bins. 
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Figure  4.2:  Histogram  of  Black  Data  with  100  Bins. 


Figure  4.2  depicts  the  overall  skewed  nature  of  the  pdf,  but  with  over  a  million  points  of  data, 
we  can  increase  the  number  of  bins  by  another  order  of  magnitude  in  order  to  improve  the  shape 
of  our  histogram.  The  resulting  histogram,  as  seen  in  Figure  4.3. 


Histogram  for  Black  Data  with  1000  Bins 


Figure  4.3:  Histogram  of  Black  Data  with  1,000  Bins. 


The  bins  are  so  narrow  now,  that  we  can  no  longer  distinguish  well  the  lines  that  separate  bins. 
We  should  note  that  the  current  histogram  exhibits  significantly  more  features  than  the  histogram 
with  10  bins.  For  example,  we  can  now  see  two  local  maximums  in  our  histogram  plot  with  1,000 
bins  that  we  could  not  see  with  only  10  bins.  To  ensure  that  no  other  local  maximums  occur  in  the 
histogram  plot  of  the  data,  we  should  look  at  another  histogram  with  10,000  bins.  See  Figure  4.4 
for  histogram: 


Histogram  for  Black  Data  with  10000  Bins 


Figure  4.4:  Histogram  of  Black  Data  with  10,000  Bins. 
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Note  that  there  is  not  much  of  a  difference  between  the  histogram  with  1,000  bins  vs.  the  his¬ 
togram  with  10,000  bins.  Therefore,  we  will  use  the  histogram  with  1,000  bins  for  our  analysis 
later  in  the  paper. 

The  following  is  the  MAT  LAB  code  used  to  create  the  above  histograms  (example  for  100  bins): 

>>  laser=ConvertedDat a . Data . MeasuredData ( 1 ,  4). Data; 

>>  hist  (laser, 100) 

>>  tit le (' Histogram  for  Black  Data  with  100  Bins') 


4.2  Daytime  Laser  Data  Analysis  Using  Histograms 

The  Daytime  Laser  Data  was  taken  during  the  daytime  with  the  same  laser  and  sensor.  The  dis¬ 
tances  are  the  same  (375  meters).  We  will  now  do  the  same  process  for  finding  the  best  histogram 
that  we  did  for  the  Black  Data.  First,  let’s  look  at  the  data  under  10  bins.  See  Figure  4.5  for 
histogram: 


Histogram  for  Daytime  Data  with  10  Bins 


Figure  4.5:  Histogram  of  Daytime  Laser  Data  with  10  Bins. 


Again,  we  can  see  that  with  only  10  bins,  we  cannot  conclude  much  about  the  shape  of  the 
probability  density  function  we  would  like  to  find.  Outside  of  determining  upper  and  lower  bounds, 
we  cannot  obtain  much  from  this  histogram.  Hence,  we  should  increase  the  number  of  bins  by  an 
order  of  magnitude.  See  Figure  4.6  for  histogram: 


Figure  4.6:  Histogram  of  Black  Data  with  100  Bins. 
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We  can  see  a  much  better  trend  with  100  bins,  but  much  of  the  variance  in  the  data  is  still 
hidden  with  the  smaller  number  of  bins.  Therefore  we  will  continue  to  increase  the  number  of  bins 
by  another  order  of  magnitude.  See  Figure  4.7  for  histogram: 


Figure  4.7:  Histogram  of  Daytime  Data  with  1,000  Bins. 


Note,  with  1,000  bins  we  can  see  spikes  on  the  right-hand  side  of  our  histogram.  These  spikes 
may  make  it  more  difficult  to  approximate  the  curve  for  a  probability  density  function  in  the  future. 
Therefore,  we  will  use  a  different  histogram  to  base  our  analysis  off  of.  By  increasing  the  order  of 
magnitude  again,  we  see  what  our  histogram  is  still  well  below  the  number  of  bins  that  it  would 
take  to  lose  the  trends  of  the  data  set.  See  Figure  4.8  for  histogram: 


Histogram  for  Daytime  Data  with  10000  Bins 


Figure  4.8:  Histogram  of  Daytime  Data  with  10,000  Bins. 


We  can  see  that  in  this  histogram,  the  spikes  are  no  longer  as  prevalent  in  the  right-hand  side 
of  the  histogram.  We  can  see  a  substantial  amount  of  detail  in  the  data,  and  therefore,  we  will  use 
this  histogram  as  a  basis  for  our  future  probability  density  function. 
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Chapter  5 

Barakat  Method  for  Computing  PDFs 
Using  Lower-Order  Moments 


The  concept  of  computing  probability  density  functions  for  laser  light  intensities  has  attracted  the 
attention  of  numerous  researchers.  One  of  these  researchers,  Richard  Barakat,  proposes  a  method 
that  incorporates  Gamma  functions  and  the  Associated  Laguerre  Polynomials  to  create  a  unimodal 
probability  density  function  for  a  laser  data  set  [2].  Barakat’s  method  for  computing  pdfs  using 
lower-order  moments  can  be  described,  below,  by  W(h),  where  Wn  are  the  mixing  weights,  LN 
are  the  Associated  Laguerre  Polynomials,  and  Wg{h)  is  the  Gamma  pdf: 


where, 


and, 


and 


and 


W{h)  =  Wg(h )  ^  WnL^' 

N= 0 


Wg(h) 
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a 

h /3_1  exp(— Ph/n), 
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<  h  >2 

<  h2  >  -  <  h  >2’ 
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E 

n= 0 


fN  +  P  -  1\  (-x)n 
\  N  —  n  J  n\ 


N 


WN  = 


n\ m  E 


- PM"(h "> 


^  n\(N  —  n)\T(f3  +  n) ' 


(5.1) 

(5.2) 

(5.3) 

(5.4) 

(5.5) 


Barakat  argues  that  the  first  five  terms  of  h  are  sufficient  to  approximate  the  PDF  of  h  reason¬ 
ably  well  [2].  In  addition,  Barakat  asserts  that  the  first  three  terms  are  independent  of  whatever  data 
set  we  look  at.  Particularly,  W0  =  l,Wi  =  W2  =  0.  We  will  prove  his  assertion  of  the  first  three 
terms,  and  attempt  to  evaluate  his  assumption  for  the  first  five  terms  for  laser  data  that  exhibits 
significant  noise  once  propagated  in  the  maritime  domain. 
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Definition:  <  h  >  is  the  arithmetic  mean  of  the  set  h.  That  is, 


<  h  >= 


hi  +  +  h$  +  •••  +  hn 

n 


(5.6) 


Theorem:  W0  —  1,  Wi  —  W2  —  0. 


Proof.  Show  Wo  —  1,  W\  —  W2  —  0. 
We  start  with  the  equation, 


N 

wn  =  n\t(p)J2 

n= 0 


(-/ 3//i)n  <  hn  > 
n\{N  -  n)\T(P  +  n) 


Let  N  =  0. 


Since  <  h°  >=<  1  >=  1, 
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Since  <h>—  n. 


Since  pr(p)  =  T{P  +  1), 


Wx  = 
Wx 


r(/3  + 1) 

rcs  +  i) 


Let  N  =  2. 


W2  =  2!r(j3)  y  (-Pl^n<hn> 

2  y()n\(2-n)\r{(3  +  n) 

After  expanding  the  summation, 

m  =  2\T(f3)  (  (-/V^)0  <  h°  >  +  (-/V/L)1  <  hl  >  +  (-/3//i)2  <  h2  > 


0! (2  —  0)!r(/3  +  0)  1!(2  -  l)!r(/3  + 1)  2!(2  —  2)!T(/3  +  2) 


w2 = 2m i  +  P&dCC  +  Wm)2 < h 2 > 


(l)(2)r(/3)  (l)l!r(j3  + 1)  (2)0!r(/3  +  2) 

Using  identities  from  previous  derivations  and  simplifying, 

w2  =  2m  i  +  m)2  <h2> 


2T(P)  r(/3  + 1)  2r(/3  +  2) 


Distributing  2T(j3)  through, 


w  =  2 m  _  wm  +  (/3/j,)2  <  e  >  2m 


2m  r(/3  +  i) 

Since  r(/3  +  2)  =  +  l)r(/3). 
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See  claim  below  for  further  proof: 
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/?  <h2  >= 
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<  h2  >  — /r2 
ji2  <  h2  > 


<  h2  >  -/i2 


□ 


For  our  MAT  LAB  code,  we  will  let  /  =  Then  our  functions  for  the  lower-order  pdf 


described  by  Barakat  become: 


W(I)  =  Wg(I)[  1  +  ^  WNL0-\pi)} 


N= 3 


Where, 

>UT)  =  m 

And 

N 

wN  =  mr(p)Y, 

n= 0 

-PT(in) 


We  define  Wg  in  MAT  LAB  with  the  following  code: 

function  [  W_g2  ]  =  Barakat_W_g2 (  beta,  I  ) 

W_g2  =  l/gamma (beta) * (beta) Abeta*I . A (beta-1)  . *exp  (-beta*I) ; 
end 

We  define  Wjy  in  MATLAB  with  the  following  code: 

function  [  W  ]  =  Barakat_W_n2 (  beta,  I,  N) 
ww=0 ; 
for  n=0 : N 

term= (-beta) An*mean (I . An)  . /  (factorial (n) * . . . 

factorial (N-n) *gamma (beta+n) ) ; 
ww=ww+term; 
end 

W=factorial (N) *gamma (beta) *ww; 
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end 

And  we  define  (x)  in  MATLAB  with  the  following  code: 

function  [  L  ]  =  Laguerre_Beta (  beta,  x,  N  ) 

L=0 ; 

for  n=0 : N 

L_term=gamma_nchoosek (N+beta-1 , N-n) * (-x) .^n/factorial (n) ; 
L=L+L_term; 
end 
end 

Using  the  functions  defined  above,  we  can  write  a  script  to  execute  W (/)  to  generate  an  ap¬ 
proximation  to  probability  density  function  for  our  data.  W (/)  can  be  executed  with  the  following 
code  (see  Figure  5.1  for  execution  of  code) 

x=%imported_data; 
x=sort (x) ;  I=x/mean (x) ; 

beta=mean (x) A2 / (mean (x . A2 ) -mean (x) A2 ) ; 

NN=5 ;  WW=0 ; 
for  N=3 : NN 

term=Barakat_W_n2 (beta,  I,  N) . *Laguerre_Beta2 (beta,  beta*I,  N) ; 
WW=WW+term; 
end 

W=Barakat_W_g2 (beta,  I) .* (1+WW) ; 
plot (x, abs (W) /trapz (x, abs (W) ) ) 


Figure  5.1:  Barakat  Method  Implemented  on  Synthetic  Data. 


Again,  by  visual  inspection,  the  Barakat  Method  appears  to  represent  the  data  well.  Once  we 
convert  the  pdf  into  a  cdf  and  perform  the  K-S  Test  and  Least  Squares  test,  we  can  see  that  the 
Barakat  Method  also  represents  the  data  better  than  the  Gaussian  CDF  (See  Figures  8.4  &  8.5). 
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Chapter  6 


Kernel  Density  Method  for  Computing 
PDFs 


A  second  approach  to  computing  pdfs  is  the  Kernel  Density  Method  (Reference  [9]  has  an  intro¬ 
duction  to  the  method),  which  we  will  apply  to  compute  the  pdf  of  our  laser  data. 


Given  a  time  series  x  =  x,  and  a  kernel  function  K(x),  we  define  an  approximation  Pk(x)  of 
the  pdf  by: 

1  N 

Pk(x)  = —^Gn,  (6.1) 

i=n 

where  Gn  is  a  pdf  of  a  distribution  centered  at  the  /7-th  term  of  the  time  series.  In  our  work,  we 
choose 

K[x)  =  e~x\  (6.2) 

and  each  Gn  will  be  a  Gaussian  pdf  centered  at with  mean  //,  =  xt  and  standard  deviation 


2A, 

where  Ax  =  Xrnax  —  Xmm  and  N  is  the  number  of  elements  in  the  time  series.  Hence, 


(6.3) 


Gn(x) 


-(x  —  xn) 

2a1 


2 


? 


(6.4) 


becomes  the  basis  for  our  Kernel  Method. 


Note,  that  in  the  general  Kernel  Method,  the  standard  deviation  is  a  parameter  to  be  determined 
by  the  user  to  obtain  the  best  estimate  of  the  pdf.  From  our  experience  with  laser  data,  we  have 
found  that  6.3  gives  us  an  approximation  of  the  pdf  that  is  comparable  to  the  pdfs  we  obtain  through 
different  methods  of  the  same  data. 

For  all  n  G  N  we  will  plot  each  Gn  and  then  sum  the  plots  together  to  create  a  curve  that  has 
area  N.  In  order  to  turn  this  curve  into  a  probability  density  function,  we  will  scale  the  curve  down 
by  factor  N.  This  will  ensure  that  we  have  area  equal  to  1  under  our  curve. 
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Figure  6.1:  Pictorial  Representation  of  the  Kernel  Method. 


y=0; 

data=[-3  023]; 

for  n=l:4 

mu=data (n) ; 

x=linspace (-6, 6, 1000) ; 

f=l/sqrt (2*pi) *exp (- (x-mu) . "2 . / 2 ) ; 

hl=plot (x, f , ' r ' ) ; 

hold  on 

y=f+y; 

end 

h2=plot (x, y, 'b  * ) ; 
h3=plot (x, y/4, ' g' ) ; 

legend ( [hi, h2, h3] ,' Individual  Gaussian  Curve  from  Data',... 

'Sum  of  Gaussian  Curves 'Normalized  Sum' , 'Location' , 'Northwest ' ) 


Figure  6.1  is  a  visual  representation  of  the  Kernel  Method,  which  is  the  output  of  the  MAT  LAB 
code  displayed  above. 

6.1  Kernel  Method  Implemented  on  Synthetic  Data 

Before  we  implement  the  Kernel  Method  on  our  laser  data,  we  will  first  implement  the  method  on 
a  synthetic  data  set  in  which  we  know  the  true  probability  density  function.  For  our  synthetic  data, 
we  will  use  10,000  points  selected  from  a  normal  distribution  with  /i  =  1  and  a  =  0.1.  We  will 
use  the  following  MAT  LAB  code  to  run  the  Kernel  Method  on  our  synthetic  data.  See  Figure  6.2 
for  the  output  of  the  following  code 

mu=l;  sigma=.l;  x=sigma*randn ( 10000 , 1 ) +mu; 
ksigma=2  *  (max  (x)  -min  (x )  )  /  (size(x,l)A(l/2)); 
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x_axis=l in space (min (x) -3*ksigma, max (x) +3*ksigma, 20000 ) ; 
kernel=zeros (size (x_axis) ) ; 
for  n=l : length (x) 
kmu=x (n) ; 

f =1  /  ksigma  / sqrt  ( 2 *pi )  *exp  ( -  (x_axis-kmu )  .  A2  .  / 2  /ksigma/x2 )  ; 
kerne l=f +kernel ; 
end 

plot (x_axis, kernel/length (x) ,  '  g'  ) ; 


Figure  6.2:  Kernel  Method  Implemented  on  Synthetic  Data. 


From  a  strictly  visual  analysis,  we  can  see  that  the  Kernel  Method  approximates  the  curve 
remarkably  well.  In  order  to  obtain  a  more  objective  comparison  between  the  two  pdfs,  we  will 
apply  the  Kolmogorov-Smimov  (K-S)  Test,  the  Least  Squares  Test,  and  compute  the  Hellinger 
Distance  in  Chapter  8. 
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Chapter  7 

Gaussian  Mixture  Method 


The  Kernel  Method  makes  intuitive  sense  when  attempting  to  approximate  a  pdf.  However,  we  run 
into  computational  problems  when  we  deal  with  data  sets  that  have  substantially  large  numbers 
of  points.  For  example,  creating  a  probability  density  function  for  a  laser  data  set  results  in  the 
summation  of  over  a  million  Gaussian  curves.  In  order  to  decrease  the  computational  issues,  we 
will  implement  the  Gaussian  Mixture  Method  (GMM)  (see  [11]  for  an  introduction  to  the  method). 

The  Kernel  Method  is  a  specialized  case  of  GMM,  in  which  the  number  of  mixture  components 
is  equal  to  the  number  of  data  points  in  the  data  set,  and  the  mixture  weights  are  uniform.  The 
generalized  GMM  can  be  described  as: 


K 

PGMM(x)  =  ^  ®nP(Pn,  CTn),  (7.1) 

n— 1 

where  K  is  the  number  of  mixture  components,  or  clusters,  <I>n  is  the  n-th  mixing  weight,  and 
p(pn,  o~n)  is  the  n-th  associated  Gaussian  Curve. 

We  begin  by  selecting  how  many  clusters  we  would  like  to  use  in  our  model.  We  then  calculate 
(in  by  randomly  placing  K  points  along  the  domain  of  the  data  points.  We  invoke  a  machine 
learning  technique  that  iteratively  reassigns  the  positions  of  pn  according  to  the  formula: 

N 

<7-2> 

3= 1 

where  Xj  are  the  data  points  in  which  \xj  —  is  a  minimum  Vn  E  K .  While  the  iterative 

sequence  will  always  stabilize  to  a  solution,  the  solution  is  not  unique.  The  solution  depends  upon 
the  initial  distribution  of  the  fin.  Once  we  have  the  "centers"  for  the  Gaussian  components,  we 
assign  a  standard  deviation  to  each  cluster  by: 

ma x{xj  :jeN }  -  min-^  :  j  E  N} 

o~k  — - ^ - ’  G-^) 

The  mixing  weights,  <Tn,  is  the  ratio  of  the  number  of  data  points  closest  to  fin  to  the  total  number 
of  data  points  in  the  data  set.  The  weights  are  chosen  so  that  J^n=i  =  1  for  all  possible  $n. 
This  ensures  that  Pc  mm  (x)dx  =  1. 
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We  then  implement  the  GMM  on  the  synthetic  data  pulled  from  a  normal  distribution  (see  Figure 
7.1).  See  Appendix  A  for  MATLAB  code  that  was  used  to  generate  the  figure. 


0.6  0.7  0.8  0.9  1  1.1  12  1.3  1.4 


Figure  7.1:  Gaussian  Mixture  Method  Implemented  on  Synthetic  Data  with  13  Clusters. 


Again,  from  a  strictly  visual  analysis,  we  can  see  that  GMM  approximates  the  histogram  curve 
remarkably  well.  In  order  to  obtain  a  quantitative  comparison  between  the  two  pdfs,  we  will  apply 
the  Kolmogorov-Smirnov  (K-S)  Test,  the  Least  Squares  Test,  and  compute  the  Hellinger  Distance 
as  described  in  Chapter  8. 
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Chapter  8 

Description  of  Comparison  Methods  and 
Implementation  of  PDF  Approximation 
Techniques  on  Synthetic  Data 


In  this  section  we  will  describe  and  implement  the  three  pdf  approximation  techniques  on  syn¬ 
thetic  data  and  compare  the  results  using  the  K-S  Test,  Least  Squares  Test,  and  by  computing  the 
Hellinger  distance. 


8.1  Kolmogorov- Smirnov  (K-S)  Test 

The  Kolmogorov-Smirnov  Test  yields  a  scalar  value  known  as  the  Kolmogorov-Smirnov  Statistic. 
For  a  given  data  set,  we  compute  the  Empirical  Cumulative  Distribution  Function  (ECDF)  and  test 
it  against  a  proposed  cumulative  distribution  function  (cdf).  The  lower  a  K-S  Statistic,  the  better 
the  fit  for  the  proposed  cdf.  The  K-S  Statistic,  Dn  can  be  computed  by  the  following  equation: 

Dn  —  sup  \Fn{x)  —  F(x)\,  (8.1) 


and 

1  n 

Fn{x)  =  -  <  x),  (8.2) 

i=l 

where  I (xi  <  x)  is  the  Indicator  function,  and  F{x)  is  the  proposed  cdf.  The  function,  Fn(x)  is  the 
ECDF  from  our  data.  Since  the  ECDF  is  a  continuous  function  on  a  closed  and  bounded  interval, 

Dn  =  max  | Fn(x)  —  F(x)  |.  (8.3) 

X 

The  K-S  Statistic  is  solely  dependent  upon  the  "worst"  point  in  the  ECDF.  See  Figure  8.1  for  visual 
representation  of  K-S  Test. 
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Example  of  K-S  Test 


Figure  8.1:  K-S  Test  Implemented  on  50  Synthetic  Data  Points  from  Gaussian  Distribution. 


8.2  Least  Squares  Test 

The  Least  Squares  Test  yields  a  scalar  value,  Ln  that  can  compare  the  quality  of  the  fit  for  a 
proposed  cdf: 

n 

Ln  =  YJ\Fn(x)-F(x)\\  (8.4) 

1=1 

where  Fn(x)  and  F(x)  were  defined  in  the  previous  section.  Note  that  the  Least  Squares  Test  takes 
into  account  all  points  in  the  cdf. 


8.3  Hellinger  Distance 

The  Hellinger  Distance  yields  a  scalar  value,  H  that  can  quantify  the  distance  between  two  proba¬ 
bility  density  functions.  For  two  pdfs,  pi(x)  and  p2{x),  H  is  defined  as: 

/OO 

y/pi{x)p2{x)dx.  (8.5) 

-OO 

Note  that  0  <  H  <  1  where  H  —  0  o-  pi(x)  and  p2(x)  are  equal,  and  H  =  1  o-  pi(x)  and  p2(x) 
are  disjoint.  The  Hellinger  Distance  takes  into  account  is  computationally  inexpensive,  as  it  does 
not  require  converting  the  pdf  into  a  cdf.  In  addition,  the  Hellinger  Distance  accounts  for  the  entire 
distribution,  as  opposed  to  comparing  via  a  single-point  supremum. 
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8.4  Application  of  K-S  Test  and  Least  Squares  Test  on  Kernel 
Method  with  Synthetic  Data  from  Gaussian  Distribution 

We  will  begin  by  defining  ECDF  within  MAT  LAB.  We  will  use  the  following  function  in  MAT  LAB 
to  generate  the  ECDF. 

function  [  y  ]  =  ecdf_mod (  x_axis,  data  ) 
y=ones (size (x_axis) ) ; 
for  n=l : si ze (x_axis , 2 ) ; 

y (n) =  size (find (data<=x_axis (n) ) , 1) ; 
end 

y=y/ size (data, 1 ) ; 
end 

If  we  look  at  the  visual  comparison  between  the  ECDF,  the  Kernel  CDF,  and  the  Gaussian 
CDF,  there  is  almost  negligible  difference  between  them  (Figure  8.2).  If  we  zoom  in,  we  can  see 
the  differences  more  clearly  (Figure  8.3). 


Kernel  CDF  Comparison 


Figure  8.2:  Comparison  of  Kernel  CDF,  ECDF,  and  Gaussian  CDF. 
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Kernel  CDF  Comparison 


Figure  8.3:  Zoomed  in  Comparison  of  Kernel  CDF,  ECDF,  and  Gaussian  CDF 


We  will  run  the  following  code  to  create  a  table  to  compute  K-S  and  Least  Squares  statistics  in 
order  to  compare  the  Kernel  Method  to  the  Gaussian  CDF: 

sigma=.l;  mu=l;  x=sigma*randn (20000 , 1 ) +mu;  x=sort  (x) ; 
ksigma=2* (max (x) -min (x) ) / ( size  (x, 1 ) A (1/2) ) ; 
x_axis=linspace (min (x) -3*ksigma, max (x) +3*ksigma, 50000) ; 
kernel=zeros (size (x_axis) ) ; 
for  n=l : length (x) 
muk=x (n) ; 

f =1 /ksigma/ sqrt ( 2  *pi ) *exp ( - (x_axis-muk)  . A2 . / 2 /ksigma A2 )  ; 
kerne l=f +kernel ; 
end 

yy=l / ( sigma* sqrt (2*pi))*exp(-.5*( (x_axis-mu) / sigma) . A2 ) ; 

kernel_cdf=cumsum (kernel/sum (kernel) ) ; 

gauss_cdf =cumsum (yy /sum (yy ) ) ; 

y=ecdf_mod (x_axis , x) ; 

ls_stat_kernel=sum ( (y-kernell) .A2) ; 

ls_stat_gauss=sum ( (y-yyl )  . A2 )  ; 

ks_st at_kernel=max (abs (y-kernell) ) ; 

ks_stat_gauss=max (abs (y-yyl) ) ; 
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Trial 

Gaussian  K-S  Test 

Kernel  K-S  Test 

Gaussian  L.S.  Test 

Kernel  L.S.  Test 

1 

0.008730316 

0.003514486 

0.270325412 

0.043871312 

2 

0.004772498 

0.003290708 

0.082172717 

0,038146703 

3 

0.004390352 

0.003361794 

0.064362886 

0.03988743 

4 

0.006107503 

0.002899473 

0.206844817 

0.033283234 

5 

0.007465381 

0.003528832 

0.320147363 

0.04676351 

6 

0.005121885 

0.004373087 

0.115170686 

0.043326217 

7 

0.005520343 

0.003548023 

0.110128309 

0.038915194 

8 

0.004898304 

0.003086733 

0.045160592 

0.044495731 

9 

0.009677349 

0.003321411 

0.498252486 

0.044058385 

10 

0.006414521 

0.003207911 

0.127893274 

0.043694205 

11 

0.005622353 

0.003422341 

0.133494516 

0.045780496 

12 

0.005544909 

0.003300643 

0.066306315 

0.047292619 

13 

0.005325703 

0.003664077 

0.125827956 

0.042060459 

14 

0.004441213 

0.003270251 

0.082478346 

0.039470854 

15 

0.004667689 

0.003082527 

0.077155772 

0.039204603 

16 

0.005212049 

0.00298129 

0.108168645 

0.040054665 

17 

0.004781662 

0.002712938 

0.147263091 

0.034870995 

18 

0.004918177 

0.003511495 

0.075119744 

0.042089731 

19 

0.005429505 

0.002915812 

0.125410971 

0.03479343 

20 

0.004425386 

0.003148757 

0.075171545 

0.042178424 

Average: 

0.005673355 

0.003307129 

0.142842772 

0.04121191 

From  the  table,  we  can  see  that  the  Kernel  Method  Cumulative  Distribution  Function  repre¬ 
sents  the  solely  data-driven  ECDF  better  than  the  Gaussian  CDF  The  K-S  Test  and  Least  Squares 
Test  both  confirm  that  the  Kernel  Method  is  a  better  representation  of  the  empirical  cumulative 
distribution  function  for  the  data  than  the  distribution  from  which  the  data  was  selected. 


Figure  8.4:  Barakat  CDF  Implemented  on  Synthetic  Data. 
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Figure  8.5:  Zoomed  in  Comparison  of  Barakat  CDF,  ECDF,  and  Gaussian  CDF. 


Trial 

Gaussian  K-S  Test 

Barakat  K-S  Test 

Gaussian  L.S.  Test 

Barakat  L.S.  Test 

i 

0.003741717 

0.004782784 

0.010816655 

0.013854076 

2 

0.009100771 

0.004365288 

0.074162032 

0.017477223 

3 

0.00587217 

0.006620209 

0.029062625 

0.026011631 

4 

0.005310268 

0.004604807 

0.023837694 

0.013546505 

s 

0.009869642 

0.006506674 

0.070711684 

0.020875552 

6 

0.012072861 

0.007640935 

0.140883954 

0.033825081 

7 

0.00988999 

0.005497651 

0.097202076 

0.028887014 

8 

0.014180303 

0.009703637 

0.227673835 

0.081647998 

9 

0.009411214 

0.005504155 

0.039722674 

0.020847152 

10 

0.008335981 

0.00445205 

0.060843847 

0.016406816 

11 

0.008769523 

0.005991756 

0.082320883 

0.021333036 

12 

0.0084606 

0.007036387 

0.048399194 

0.029834825 

13 

0.006689778 

0.005098268 

0.047057498 

0.019582776 

14 

0.007631687 

0.004748362 

0.060000535 

0.017543446 

IS 

0.009348316 

0.003951488 

0.114133994 

0.014394307 

16 

0.004386149 

0.004872546 

0.012045822 

0.017690785 

17 

0.006988732 

0.004788965 

0.031973471 

0.017049715 

18 

0.007004244 

0.005483672 

0.055884698 

0.01862186 

19 

0.010853364 

0.004452651 

0.117317925 

0.016806959 

20 

0.008025353 

0.007026941 

0.040481445 

0.040422328 

Average: 

0.008297133 

0.005656461 

0.069226627 

0.024332954 

8.5  Hellinger  Distance  on  Synthetic  Data 

In  this  section  we  will  implement  all  three  approximation  techniques  and  compare  the  similarities 
between  the  pdfs  using  the  Hellinger  Distance.  The  synthetic  data  simulation  creates  50  data 
sets  of  synthetic  data  pulled  from  the  same  distribution.  We  implement  the  three  approximation 
techniques  and  compute  the  Hellinger  Distance  using  the  Gaussian  distribution  from  which  the 
data  was  pulled  as  the  baseline  pdf.  The  results  are  shown  below  in  depth:  (See  Appendix  for 
supporting  MAT  LAB  code) 


Barakat 

GMM 

Kernel 

1.2463358E-04 

4.4506809E-04 

3.4055791E-03 

Figure  8.6:  Table  of  Average  Hellinger  Distance  after  50  Simulations. 
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Barakat 

GMM 

Kernel 

0.000317342 

0.000864476 

0.003930431 

0.000159443 

0.00046888 

0.003617357 

0.000108236 

0.000413539 

0.003422314 

4.25E-05 

0.000256833 

0.003230477 

-0.000120685 

3.07E-05 

0.003145302 

-0.000207601 

-0.000121275 

0.00  32  1  885  8 

2.83E-05 

0.000255226 

0.003481929 

0.000265932 

0.000382234 

0.003052538 

6.90E-05 

0.000372775 

0.003276085 

6.53E-05 

0.000209794 

0.00368244 

0.000167666 

0.000475029 

0.003682658 

4.06E-05 

0.000408525 

0.003621717 

0.000145774 

0.000374245 

0.00335986 

0.000192333 

0.000578214 

0.003858138 

9.22E-05 

0.000367495 

0.003547981 

0.000191431 

0.000491128 

0.00372318 

0.000125497 

0.000523188 

0.003593297 

0.000135682 

0.000507496 

0.003644294 

0.000133122 

0.000313051 

0.003607424 

0.000176125 

0.000598791 

0.003342113 

-5.11E-06 

0.000344593 

0.003050142 

0.000148496 

0.000299137 

0.003502376 

9.00E-05 

0.000408509 

0.003485367 

0.000363634 

0.000586945 

0.00380843 

0.000201672 

0.000730999 

0.003403014 

0.000240903 

0.000504313 

0.003723927 

0.000125479 

0.000565985 

0.003338416 

0.000148184 

0.000615012 

0.002900943 

1.07E-05 

0.000416201 

0.003445834 

0.000334906 

0.000623258 

0.003617947 

0.000256256 

0.000751865 

0.003563757 

9.26E-05 

0.000450765 

0.003748495 

-7.90E-05 

0.000135497 

0.00282983 

0.000121118 

0.000478484 

0.003135431 

8.91E-05 

0.000194767 

0.003469072 

0.00013715 

0.000568589 

0.003253515 

2.27E-05 

0.0002712 

0.002931291 

5.64E-05 

0.000483626 

0.002998006 

7.51E-05 

0.000410133 

0.003232675 

0.000258302 

0.000738333 

0.00370976 

0.00030403 

0.00085435 

0.003713396 

0.000142529 

0.000551272 

0.003317579 

0.00021561 

0.000  5  9  4996 

0.004030471 

0.000128206 

0.000512315 

0.003055812 

-6.40E-06 

0.000342636 

0.002816854 

0.000154352 

0.000429945 

0.003961581 

0.000158084 

0.000500878 

0.003103454 

0.000165815 

0.000434945 

0.003549039 

2.26E-05 

0.000317952 

0.002799158 

0.000130099 

0.000365608 

0.002744995 

Figure  8.7:  Table  of  Full  Results  of  Hellinger  Distances  after  50  Simulations 
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Figure  8.8:  Visual  Comparison  of  the  Three  Approximation  Techniques  Implemented  on  Synthetic 
Data. 


We  can  see  from  the  Hellinger  Distance  data  in  the  table  above  that  the  Barakat  Method  was 
best  at  approximating  the  Gaussian  Curve.  Although  the  Kernel  Method  visually  represented  the 
data  better,  and  represented  the  ECDF  better,  the  Barakat  Method  was  optimal  for  producing  a  pdf 
most  similar  to  the  pdf  of  the  distribution  from  which  the  data  was  pulled. 
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Chapter  9 

Implementation  and  Comparison  of  PDF 
Approximation  Techniques  on  Laser  Data 


In  this  chapter  we  will  use  the  methods  described  in  the  previous  section  to  evaluate  laser  data  sets. 


9.1  Nighttime  Laser  Data 

The  first  laser  data  set  we  will  be  using  is  the  HeNe  laser  data  set  taken  in  a  nighttime,  outdoor 
environment.  The  impact  of  the  atmosphere  is  less  during  the  nighttime  because  the  light  from 
the  sun  is  not  interacting  with  the  photons  from  the  laser  light.  The  sensor  is  stationary,  375  me¬ 
ters  away,  and  recording  intensities  at  a  rate  of  10  kHz.  The  wavelength  of  the  laser  light  is  633  nm. 

We  will  implement  the  three  approximation  techniques  on  the  nighttime  data  sets,  and  use 
the  Hellinger  Distance  to  determine  which  method  is  best  for  approximating  the  pdf  of  the  laser 
data.  In  addition,  we  will  implement  a  visual  test  for  convergence  on  the  Barakat  Method  for  real 
laser  data.  In  regards  to  the  Barakat  Method,  the  (3  value  is  often  too  large  to  implement  the  method 
directly.  Therefore,  we  will  remove  the  mean  from  the  data,  and  implement  the  method  to  decrease 
the  (3  value.  Once  the  method  has  completed,  we  will  insert  the  mean  back  into  the  domain. 

We  will  begin  by  analyzing  two  data  sets  taken  in  the  nighttime  environment.  Figures  9.1  and 
9.3,  are  the  time  series  plot  of  the  two  data  sets  we  are  analyzing.  Figures  9.2  and  9.4  depict 
comparisons  of  the  three  approximation  techniques  discussed  in  the  previous  chapters.  We  can 
see  that  the  GMM  and  Kernel  Method  approximate  the  curve  of  the  histogram  remarkably  well. 
However,  the  Barakat  method  does  not  converge  to  an  accurate  approximation  to  the  pdf  for  the 
data.  In  Figure  9.5  we  can  see  the  progression  of  the  Barakat  pdf  through  increased  terms  (terms 
5-15  are  displayed).  Note  that  there  is  not  much  visual  difference  between  the  pdf  with  5  terms 
compared  to  the  pdf  with  15  terms. 
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Figure  9.1:  Time  Series  Function  of  Laser  Intensity  During  Nighttime  1. 
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Figure  9.2:  Visual  Comparison  of  the  Three  Approximation  Techniques  Implemented  on  Nighttime 
Data  1. 
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Figure  9.3:  Time  Series  Function  of  Laser  Intensity  During  Nighttime  2. 


42 


0.5  1  1.5  2  2.5  3  3.5  4  4.5 


Figure  9.4:  Visual  Comparison  of  the  Three  Approximation  Techniques  Implemented  on  Nighttime 
Data  2. 


Figure  9.5:  Visual  Representation  for  Convergence  of  Barakat  Method  for  Nighttime  Laser  Data 
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9.2  Daytime  Laser  Data 

The  final  laser  data  sets  we  will  be  using  is  the  HeNe  laser  data  set  taken  in  a  daytime,  outdoor 
environment.  The  impact  of  the  atmosphere  is  greater  during  the  daytime  because  the  light  from 
the  sun  is  now  interacting  with  the  photons  from  the  laser  light.  The  light  waves  from  the  sun 
dominate  the  light  waves  of  the  laser  light,  creating  less  fluctuations  in  intensity  at  the  target.  The 
sensor  is  stationary,  375  meters  away,  and  recording  intensities  at  a  rate  of  10  kHz.  The  wavelength 
of  the  laser  light  is  633  nm.  Figure  9.7  and  Figure  9.9  represent  visual  comparisons  of  the  three 
pdf  approximation  techniques  executed  on  daytime  laser  data. 

Paralleled  with  the  previous  secton,  we  will  now  analyze  two  data  sets  taken  in  the  daytime 
environment.  Figures  9.6  and  9.8,  are  the  time  series  plot  of  the  two  data  sets  we  are  analyzing. 
Figures  9.7  and  9.9  depict  comparisons  of  the  three  approximation  techniques  discussed  in  the 
previous  chapters.  Again,  we  can  see  that  the  GMM  and  Kernel  Method  approximate  the  curve  of 
the  histogram  remarkably  well  in  both  figures.  The  Barakat  method  also  converges  to  an  accurate 
pdf  approximation  in  Figure  9.7.  However,  the  Barakat  method,  in  Figure  9.9  does  not  converge 
to  any  pdf  (19  terms  were  used).  In  Figure  9.10  we  can  see  the  progression  of  the  Barakat  pdf 
through  increased  terms  (terms  5-15  are  displayed).  Note  that  the  pdf  does  not  appear  to  converge 
through  all  15  terms. 


Time  Series  Plot  of  Daytime  Laser  Data 
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Figure  9.6:  Time  Series  Function  of  Laser  Intensity  During  Daytime  1. 
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Figure  9.7:  Visual  Comparison  of  the  Three  Approximation  Techniques  Implemented  on  Daytime 
Data  1. 


Time  Series  Plot  of  Daytime  Laser  Data 


Figure  9.8:  Time  Series  Function  of  Laser  Intensity  During  Daytime  2. 
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Figure  9.9:  Visual  Comparison  of  the  Three  Approximation  Techniques  Implemented  on  Daytime 
Data  2. 
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Figure  9.10:  Visual  Representation  for  Convergence  of  Barakat  Method  for  Daytime  Laser  Data. 
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Chapter  10 

Solution  to  Paraxial  Wave  Equation  with 
Stochastic  Refraction  Coefficient 


Computing  a  solution  to  the  paraxial  wave  equation,  defined  in  Chapter  2,  is  non-trivial  and  of  great 
importance  when  attempting  to  model  the  behavior  of  the  laser  beam  at  a  target  once  propagated 
through  a  turbulent  medium.  From  Chapter  2, 


V(x,y,z)  = 


A 


exp(- 


r2  ,  ,  kr 2 


+  i(<j>  + 


v'A fTel  '  W(zY  "  2  F(z 


)), 


where  A0,  called  the  refraction  parameter,  and  @0,  the  diffraction  parameter,  are 

Ao'1_iv  Ao~w? 


(10.1) 


(10.2) 


o 


and  W  and  F,  which  are  the  spot-size  radius  of  the  beam  and  its  phase  radius  of  curvature  at  any 
z,  are 

W{z)  =  W0Ja2  +  QI  (j)  =  tan”1^  (10.3) 

v  Wo 

(©o  +  Ag)(6*0  -  1) 


F(z)  =  F0 


(10.4) 


©0  +  A-0  —  ©o) 

Note  that  10.1  results  in  a  plot  of  contours  of  the  intensity  of  the  laser  beam.  Since  our  data 
is  obtained  from  a  sensor  that  records  the  time  series  average  intensities  from  a  specified  area,  we 
will  average  the  values  of  V  to  obtain  a  single  time- series  data  point.  We  will  vary  our  refraction 
coefficient  k  as  such: 


k  =  £;0  +  7(2  +  ?7q,2  +  ?7/3),  (10.5) 

where  ko  is  fixed,  y(2  +  r)0 .  2  +  r]p)  is  a  random  variable  pulled  from  a  gamma  distribution  with 
parameters  a  =  2  +  r]a  and  f3  =  2  +  rjp.  From  experimenting  with  probability  density  functions 
of  laser  light  intensities  in  the  maritime  domain,  an  underlying  gamma  distribution  was  selected 
for  the  stochastic  parameter  k.  r)a  and  are  uniform  random  variables  to  increase  the  variance 
in  the  distribution,  which  accounts  for  the  extra  noise  we  see  in  the  data,  and  provides  an  addition 
parameter  to  vary  when  accounting  for  atmospheric  conditions.  We  will  run  a  time  series  simula¬ 
tion  that  uses  10.1  and  10.5  to  generate  simulated  time  series  data  (See  Figure  10.1).  We  will  then 
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implement  the  three  pdf  approximation  techniques  to  generate  a  pdf  of  the  simulated  data  (See 
Figure  10.2). 


Figure  10.1:  Simulated  Time  Series  Laser  Data. 
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Figure  10.2:  PDF  of  Simulated  Time  Series  Laser  Data. 
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Chapter  11 
Conclusion 


11.1  Barakat 

In  conclusion,  we  can  see  that  while  the  Barakat  Method  is  capable  of  modeling  the  synthetic  data 
remarkably  well,  it  cannot  always  account  for  the  noise  in  the  real  data.  In  addition,  the  beta  values 
from  the  Barakat  Method  that  result  from  the  real  data  sets  are  often  too  large  to  support  numerical 
approximations  computationally.  However,  this  can  be  offset  by  removing  the  mean  from  the  data, 
and  adding  it  back  once  the  method  has  been  implemented.  Regardless,  we  have  seen  instances  in 
which  the  Barakat  Method  still  does  not  converge  to  a  smooth  pdf. 


11.2  Kernel 

The  Kernel  Method  is  capable  of  representing  the  empirical  data  well,  but  does  not  yield  as  much 
information  about  the  underlying  distribution  from  which  the  data  is  pulled  (as  discovered  from 
the  synthetic  data  simulation).  In  addition,  the  Kernel  Method  is  computationally  very  strenuous. 
The  computational  power  can  be  minimized  by  not  using  every  data  point  in  the  data  set,  but 
rather  generated  from  a  user-defined  number  of  data  points  randomly  chosen  from  the  data  set.  In 
addition,  the  Kernel  Method  is  also  subjected  to  the  user’s  input  for  standard  deviation.  While  we 
have  found  that  6.3  is  a  good  rule  of  thumb  for  the  data  sets,  a  can  altered  by  the  user  to  make 
minor  adjustments  to  fit  the  approximate  pdf  to  the  histogram  curve. 


11.3  Gaussian  Mixture  Method 

Ultimately,  the  Gaussian  Mixture  Method  represented  the  data  set  well,  and  was  not  as  computa¬ 
tionally  strenuous.  However,  the  resulting  pdf  is  not  a  unique  solution,  because  it  depends  on  the 
initial  location  of  the  center  points  of  the  clusters.  In  addition,  it  requires  the  user  to  input  the  num¬ 
ber  of  clusters  beforehand.  Furthermore,  we  can  see  oscillations  at  the  peak  of  the  pdf  we  know 
from  the  histograms  that  these  oscillations  should  not  be  present.  Ultimately,  there  are  numerous 
variations  to  computing  a  pdf  via  GMM  (see  reference  [11]  for  examples  of  different  techniques 
of  GMM). 
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11.4  Summary 

In  essence,  if  the  data  set  is  unimodal,  and  the  beta  value  is  sufficiently  small,  the  Barakat  Method 
approximates  the  underlying  distribution  function  of  the  data  set  the  best.  The  method  is  effi¬ 
cient  and  most  similar  to  the  distribution  from  which  the  data  was  drawn.  However,  if  the  data 
is  complex,  we  have  seen  that  the  Barakat  method  can  sometimes  break  down  and  not  converge. 
Even  if  the  Barakat  method  does  converge,  it  could  still  not  represent  the  data  well  in  some  in¬ 
stances.  In  these  cases,  the  Gaussian  Mixture  Model  is  the  best  solution  to  approximating  the  pdf, 
when  evaluating  objectively  through  the  K-S,  Least  Squares,  and  Hellinger  Distance  tests.  It  is 
less  computationally  strenuous  than  the  Kernel  Method,  and  represents  the  underlying  distribution 
function  well,  by  the  Hellinger  Distance.  The  Kernel  Method  exhibits  less  oscillation,  and  results 
in  a  smoother  pdf,  but  does  does  not  represent  the  underlying  distribution  as  well  as  the  Barakat 
and  GMM  techniques. 
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Appendix 


Barakat  Function 

function  [  W,  barakat_pdf  ,  graph  ]  =  B  arakat_Function  (  laser, 
moments  ,  logic  ) 

laser=sort (laser ) ; 
min_laser=min ( laser  )  ; 
laser  =  laser—  min_laser  ; 

I=laser/ mean (laser)  ; 

beta=mean(  laser  )  A2/(mean(  laser  .  A2)—  mean(  laser  )  A2)  ; 

NN=moments  ; 

WW=0; 

for  N=3:NN 

term=Barakat_W_n2  (  beta  ,  I,  N)  .  *  Laguerre_Beta  (  beta  ,  beta*I  ,  N) 

9 

W^VW+term  ; 

end 

W=Barakat_W_g2  (  beta  ,  I).*(1+WW); 

barakat_pdf=abs  (W)  /trapz(laser  ,abs  (W) )  ; 
laser  =  laser  +  min_laser  ; 
if  logic  ==1 

graph  =  plot(laser  ,barakat_pdf  ,  ’  c  ’  Line  Width  ’  ,2)  ; 
t  i  1 1  e  ( [  ’  B  arakat  Method  with  ’  ,  num2str  (NN)  ,  ’  moments’]) 

else 

graph=NaN ; 

end 

end 


1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

1 

2 

3 

4 

5 

6 

7 

8 

9 


52 


ECDF  Computation  Function 

function  [  y  ]  =  ecdf_mod(  x_axis  ,  data  ) 
%UNTITLED6  Summary  of  this  function  goes  here 
%  Detailed  explanation  goes  here 

y=ones(size(x_axis))  ; 

for  n  =  1 :  s ize  (  x_axis  ,  2 )  ; 

y(n)  =  size(find(data<=x_axis(n))  ,1)  ; 

end 

y=y  /  size  (  data  ,  1 )  ; 
end 

function  [  y  ]  =  ECDF_data(  laser  ) 

%UNTITLED11  Summary  of  this  function  goes  here 
%  Detailed  explanation  goes  here 

y  =  l :  size  ( laser  ,  1 )  ; 
y=y / size  (laser  ,1)  ; 


end 
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Final  PDF  Comparison 

for  n  =  l :  1 

clear  all 
clc 

data=.l*randn(10000,l)+l; 
data=  s  ort  (  data  )  ; 

gauss  =  l/sqrt(2*pi)/.l*exp(—  (data—  1).A2/(2*.1A2)); 
%Compute  QVM 

[  xg ,  yg  ,  graph  ]  =  Gaussian_Mixture  (  data,  13,  10  ); 
hold  on 

%Compute  Kernel 

[xk,  yk  ,  graph  ]=  Kernel_Method_function  (  data  )  ; 

%Compute  Barakat 

[  W,  barakat_pdf  ,  graph  ]  =  B  arakat_Function  (  data,  5  ); 

t  i  1 1  e  (  ’  S  y  nthetic  Data  Comparison’) 

legend( ’Histogram’  ,  ’GMM’  ,  ’  Kernel  ’  ,  ’  Barakat  ’  ) 


end 
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Gaussian  Mixture  Method  Function 

function  [  x,  y,  graph  ]  =  Gaussian_Mixture  (  laser  ,  k,  N,  logic  ) 
%laser  is  the  data 
%k  is  number  of  clusters 
9£N  is  number  of  iterations 

%load  (  ’  LaserData  \  HeNe_50dB_gain_375meters_1024_USNA_4  ’ ) 

%laser  =  ConvertedData.Data.  MeasuredData  (1  ,4)  .  Data  ; 

%  mu=  1 ;  sigma  =  .  1 ; 

%  laser  =  sigma*randn  (100000, 1  )+mu; 
laser=sort (laser ) ; 

if  logic ==1 

[hist_y  ,  hist_x  ]  =  hist  (laser  ,200)  ; 
hist_y  =  hist_y  /  trapz  (hist_x  ,  hist_y  )  ; 
bar(hist_x  ,hist_y) 
hold  on 

end 


delt  ax  =2*  min  ( [  max(  laser  )— mean  (laser)  ,  mean(  laser  )— min  (laser)])  ; 

centers  =  deltax*rand(k,l)  +mean (laser)  — deltax  ; 
centers  =  linspace  (min  ( laser  )  ,  max(  laser)  ,  k) 
centers  =  sort  (  centers  )  ; 

m=zeros (k ,N)  ; 
for  i  t  e  r  =  1  :N 
a  =  l ; 

ind  =  zeros  (k  ,  1 )  ; 
sigma=zeros  (k  ,  1 )  ; 
m( :  ,  iter  )  =  centers  ; 
for  n  =  1  : k— 1 

upp  =  (  centers  (n  +  l)  +  centers  (n) )  / 2 ; 
b=find(  laser  >upp  ,  1 )  ; 
centers  (n)  =mean  (laser(a:b))  ; 
ind (n)=b— a ; 

sigma(n)=(laser(b)  —  laser(a))/2; 
a=b ; 

end 


centers (k)=mean (laser(b:end)); 
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sigma  (k)  =(  max  (  laser)— laser  (b))/2; 
ind (k)=size ( laser  ,1)— b; 
centers  =  centers(~isnan(  centers)); 
k=size(centers  ,1); 


end 

weigh  ts  =  ind/(  size  (laser  ,1)— 1); 
for  n  =  l  :k 

%plot  steady  state  center  points  of  clusters 
%plot (m(n  , : ) ) 

%hold  on 

end 

%x=  linsp  ac  e  ( min  (laser)  —4*  sigma  ( 1 )  ,max(  laser)  +4*sigma(k)  ,10000); 
x=sort (laser )  ; 

y=zeros ( size (x) ) ; 
for  n  =  1 : k 

if  sigma(n)~=0 

y=y+ weights  (n)/sigma(n)  /  sqrt (2*  pi )  *exp  (— ((x— centers  (n)) 
.A2)/(2*sigma(n)A2))  ; 

end 

end 

%fi  gure 
if  logic  ==  1 

graph  =  plot(x,y,  ’  r  ’  Line  Width  ’  ,2)  ; 

t  i  1 1  e  ([’ Gaussian  Mixture  Model  with  ’  ,  num2str  (k)  ,  ’  Clusters’]) 
else 

graph=NaN ; 

end 


end 
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Hellinger  Distance 

1  function  [  h  ]  =  Hellinger(  x,  pdfl  ,  pdf2  ) 

2  %UNTITLED  Summary  of  this  function  goes  here 

3  %  Detailed  explanation  goes  here 

4 

5  fg  =  sqrt  (pdfl  .*  pdf2  )  ; 

6 

7  h=l— trapz (x , fg ) ; 

8 

9  end 
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Kernel  Method  Function 

function  [  x,  y,  graph  ]  =  Kernel_Method_function  (  laser  ,  logic  ) 
%  laser  =  data 

sigma  =  2*(max(  laser  )— min(  laser))  /  sqrt(length(laser))  ; 
%x=linspace  (min(  laser  )— 3* sigma  ,max(  laser  ) +3* sigma  ,10000)  ; 
x=sort(laser) ; 
y=zeros ( size (x) ) ; 

h=waitbar  (0  ,’ Starting  Kernel  Method’); 
for  n  =  l : length (  laser  ) / 1 00  —  1 

waitbar  (n/(  length  (  laser  )  / 1 00  —  1)  ,h  Loading  Kernel  Method’ 

) 

mu=laser (100*n) ; 

f  =  l/  sigma/sqrt(2*pi)  *exp(  —  (x— mu)  .A2./2  /  sigmaA2)  ; 

y=f +y ; 

end 

close (h) 

y=y /(  length  /  laser  )/l 00  — 1) ; 
if  logic  ==  1 

graph  =  plot  (x  ,  y  ,  ’  g  ’  ,  ’  Line  Width  ’  ,2)  ; 
t  i  1 1  e  (’ Kernel  Method’) 

else 

graph=NaN ; 

end 


end 
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PDF  to  CDF  Converting  Function 

function  [  cdf  ]  =  pdf_to_cdf  (  x,  pdf  ) 

%UNTITLED13  Summary  of  this  function  goes  here 
%  Detailed  explanation  goes  here 
cdf=zeros( size (pdf)) ; 

h=waitbar  (0  Converting  PDF  to  CDF’); 
for  n  =  2:  size  (x  ,  1 ) 

waitbar  (n/(  size  (x  ,  1 ))  ,h  Converting  PDF  to  CDF’) 
cdf(n)=trapz(x(l:n) ,pdf(l:n)); 

end 

close (h) 
cdf ( end )  =  1 ; 


